Large-scale analysis of formant frequency estimation variability in conversational telephone speech
Abstract
We quantitatively investigate how the telephone channel and regional dialect might affect formant frequency estimates extracted with tools commonly used in law enforcement. The telephone channel and regional dialect are important factors in forensic phonetics. In 90% of forensic cases, the speech sample in question is recorded after transmission via telephone (Byrne and Foulkes, 2004). In addition, quantifiable norms of dialect-dependent features are necessary for forensic examiners to assess whether a given acoustic feature is speaker-specific or commonly found in the speaker's dialect (Rose 2002). Past studies have analyzed how the telephone channel and regional dialects might influence formant frequency estimates, but the number of speakers is often limited: at most 20 subjects for channel studies (Byrne and Foulkes 2004; Kunzel 2001) and at most 439 speakers for American English dialects (Labov, Ash, and Boberg 2006). To the best of our knowledge, our work is the largest-scale study on these topics.
Similar resources
Acoustic variability in spontaneous conversational speech of American English talkers
Speaker variability strongly impacts human perception and technology performance, yet large-scale, systematic study of the acoustic characteristics involved is rarely undertaken. This study provides statistics on selected segmental and suprasegmental acoustic parameters from measures made on spontaneous conversational telephone speech from 160 speakers in the Switchboard Corpus. Since spontaneo...
Statistical Variation Analysis of Formant and Pitch Frequencies in Anger and Happiness Emotional Sentences in Farsi Language
Setup of an emotion recognition or emotional speech recognition system is directly related to how emotion changes the speech features. In this research, the influence of emotion on the anger and happiness was evaluated and the results were compared with the neutral speech. So the pitch frequency and the first three formant frequencies were used. The experimental results showed that there are lo...
Large Scale MMIE Training for Conversational Telephone Speech Recognition
This paper describes a lattice-based framework for maximum mutual information estimation (MMIE) of HMM parameters which has been used to train HMM systems for conversational telephone speech transcription using up to 265 hours of training data. These experiments represent the largest-scale application of discriminative training techniques for speech recognition of which the authors are aware, a...
Speaker and channel-normalized set of formant parameters for telephone speech recognition
The speech parameters, most commonly used nowadays, are Cepstral coefficients derived from FFT or LPC Spectrum. An alternative approach that can potentially provide maximum speaker and channel independence is estimation of articulatory based features such as formant frequencies, amplitudes and voicing degree. A present report describes a new method and algorithm of robust estimation of F1(t), F...
Acoustic Features of Four Types of Laughter in Natural Conversational Speech
This paper presents the results of an analysis of the representative sounds of human laughter from a large corpus of naturally-occurring conversational speech. Two contrasting manners of laughter were categorized for the study: polite formal laughs and sincere mirthful laughs, and a formant analysis was performed on four phonetic classes of laugh therein. Laughing speech was also common in the ...